Cluster ensemble selection based on a new cluster stability measure

Authors

  • Hosein Alizadeh
  • Behrouz Minaei-Bidgoli
  • Hamid Parvin
Abstract

Many stability measures, such as Normalized Mutual Information (NMI), have been proposed to validate a set of partitionings. A set of partitionings may well contain one or more high-quality clusters yet still be judged poor by a stability measure and, as a result, be completely neglected. Inspired by evaluation approaches that measure the efficacy of a set of partitionings, researchers have tried to define new measures for evaluating a single cluster. Thus far, the measures defined for assessing a cluster are entirely based on the well-known NMI measure. This paper discusses the drawback of this commonly used approach and then proposes a new asymmetric criterion, called the Alizadeh–Parvin–Moshki–Minaei criterion (APMM), to assess the association between a cluster and a set of partitionings. The APMM criterion overcomes the deficiency in the conventional NMI measure. We also propose a clustering ensemble framework that uses APMM to find the best-performing clusters. The framework uses Average APMM (AAPMM) as a fitness measure to select a number of clusters instead of using all of the results. Any cluster whose AAPMM satisfies a predefined threshold is selected to participate in an elite ensemble. To combine the chosen clusters, a co-association matrix-based consensus function is used, from which the set of resultant partitionings is obtained. Because Evidence Accumulation Clustering (EAC) cannot derive the co-association matrix from a subset of clusters, a new EAC-based method, called Extended EAC (EEAC), is employed to construct the co-association matrix from the chosen subset of clusters. Empirical studies show that our proposed approach outperforms other cluster ensemble approaches.
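The selection and combination steps described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not give the APMM formula, so the per-cluster scores below are hypothetical placeholder values, and the normalization used for the co-association matrix (co-occurrence count divided by the larger of the two samples' appearance counts) is an assumption about how a matrix might be built when only some clusters survive selection. The function names `select_clusters` and `coassociation_from_clusters` are likewise made up for illustration.

```python
import numpy as np


def select_clusters(clusters, scores, threshold):
    """Keep the clusters whose stability score meets the threshold.

    `scores` stands in for a per-cluster fitness such as AAPMM; the values
    used in this sketch are hypothetical, not computed from the paper.
    """
    return [c for c, s in zip(clusters, scores) if s >= threshold]


def coassociation_from_clusters(clusters, n_samples):
    """Build a co-association matrix from an arbitrary subset of clusters.

    Each cluster is a collection of distinct sample indices. Entry (i, j) is
    the number of selected clusters containing both i and j, divided by
    max(n_i, n_j), where n_i counts the selected clusters containing i.
    This normalization is an assumption made for this sketch.
    """
    together = np.zeros((n_samples, n_samples))
    appear = np.zeros(n_samples)
    for members in clusters:
        idx = np.fromiter(members, dtype=int)
        appear[idx] += 1                    # sample i appears in one more cluster
        together[np.ix_(idx, idx)] += 1     # every pair in the cluster co-occurs
    denom = np.maximum.outer(appear, appear)
    # Samples that appear in no selected cluster get a zero row and column.
    return np.divide(together, denom, out=np.zeros_like(together), where=denom > 0)


# Four candidate clusters over five samples, with hypothetical scores.
clusters = [[0, 1, 2], [3, 4], [0, 1], [2, 3, 4]]
scores = [0.9, 0.8, 0.7, 0.2]

selected = select_clusters(clusters, scores, threshold=0.5)
co = coassociation_from_clusters(selected, n_samples=5)
print(co)
```

The resulting matrix can then be passed to any consensus step that accepts pairwise similarities, for example hierarchical clustering on `1 - co`.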

Similar resources

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been applied to clustering problems. Previous research showed that this theory can significantly increase the stability and performance of...

A new ensemble clustering method based on fuzzy cmeans clustering while maintaining diversity in ensemble

Ensemble clustering has been considered one of the research approaches in data mining, pattern recognition, machine learning and artificial intelligence over the last decade. In ensemble clustering, several base clusterings are produced first, and then a function is used to aggregate them into a final clustering that is as similar as possible to all of the base clusterings. The inp...

Hierarchical cluster ensemble selection

Clustering ensemble performance is affected by two main factors: diversity and quality. Selecting a subset of the available ensemble members based on diversity and quality often leads to a more accurate ensemble solution. However, there is no definite relationship between diversity and quality when selecting a subset of ensemble members. This paper proposes the Hierarchical Cluster Ensemble Sel...

Moderate diversity for better cluster ensembles

Adjusted Rand index is used to measure diversity in cluster ensembles and a diversity measure is subsequently proposed. Although the measure was found to be related to the quality of the ensemble, this relationship appeared to be non-monotonic. In some cases, ensembles which exhibited a moderate level of diversity gave a more accurate clustering. Based on this, a procedure for building a cluste...

Journal:
  • Intell. Data Anal.

Volume 18

Publication date: 2014